ABSTRACT

The mid-infrared spectra of ultraluminous infrared galaxies (ULIRGs) contain a va- 
riety of spectral features that can be used as diagnostics to characterise the spec- 
tra. However, such diagnostics are biased by our prior prejudices on the origin of 
the features. Moreover, by using only part of the spectrum they do not utilise the 
full information content of the spectra. Blind statistical techniques such as principal 
component analysis (PCA) consider the whole spectrum, find correlated features and 
separate them out into distinct components. 

We further investigate the principal components (PCs) of ULIRGs derived in
Wang et al. (2011). We quantitatively show that five PCs are optimal for describing
the IRS spectra. These five components (PC1-PC5) and the mean spectrum provide
a template basis set that reproduces spectra of all z < 0.35 ULIRGs within the noise.
For comparison, the spectra are also modelled with a combination of radiative transfer 
models of both starbursts and the dusty torus surrounding active galactic nuclei. The 
five PCs typically provide better fits than the models. We argue that the radiative 
transfer models require a colder dust component and have difficulty in modelling strong 
PAH features. 

Aided by the models we also interpret the physical processes that the principal
components represent. The third principal component is shown to indicate the nature
of the dominant power source, while PC1 is related to the inclination of the AGN
torus.

Finally, we use the 5 PCs to define a new classification scheme using 5D Gaussian
mixtures modelling, trained on widely used optical classifications. The five PCs,
average spectra for the four classifications and the code to classify objects are made
available at: http://www.phys.susx.ac.uk/~pdh21/PCA/
1 INTRODUCTION

Ultraluminous Infrared Galaxies (ULIRGs) are galaxies
whose rest-frame infrared luminosities, $L_{8-1000\,\mu\mathrm{m}}$, exceed
$10^{12}\,L_\odot$. Although ULIRGs were first discovered using
ground based photometry in the 1970s (Rieke & Low 1972),
the IRAS survey transformed our understanding by observing
the objects in much larger numbers (Soifer et al. 1984).
Most have high star-formation rates (SFR > 100 $M_\odot$ yr$^{-1}$),
while around half also contain an embedded Active Galactic
Nucleus (AGN).

ULIRGs are rare in the local Universe, with fewer than
fifty at z < 0.1, but the associated luminosity function
shows strong, positive evolution with redshift (e.g. Sanders
1999), resulting in several hundred ULIRGs per square degree
at z > 1 (Rowan-Robinson et al. 1997; Barger et al. 1998;
Hughes et al. 1998; Eales et al. 2000; Fox et al. 2002;
Le Floc'h et al. 2005). The increase in number density with
redshift and their associated high SFR mean that ULIRGs make
a significant contribution to the history of star formation at
high z.
The mid to far infrared luminosity of ULIRGs is a result
of dust and gas reprocessing the optical and UV radiation
emitted by stars and/or AGN. Obtaining spectroscopy
for the mid-infrared part of the spectrum became possible
with instruments such as the Infrared Space Observatory
(ISO; Kessler et al. 1996) and the Infrared Spectrograph
(IRS; Houck et al. 2004) on the Spitzer Space Telescope
(Werner et al. 2004). The ULIRG spectra from these
instruments contain a wealth of spectral features. These
include the broad emission features from polycyclic aromatic
hydrocarbons (PAHs), which are strong in star forming regions
but absent in AGN dominated sources (Moorwood 1986;
Roche et al. 1991). A prominent [Ne V] 14.3 μm fine
structure line indicates the presence of an AGN, while the
silicate features at 9.7 and 18 μm probe source geometry
(Imanishi et al. 2007).

Combinations of the PAH emission lines, mid-infrared
fine-structure lines and silicate features have been used
as diagnostics for characterising the power source behind
the ULIRGs (Genzel et al. 1998; Rigopoulou et al. 1999;
Spoon et al. 2007; Farrah et al. 2007, 2008, 2009). There
are however problems associated with these diagnostic tools,
such as the separation of emission lines from both the continuum
and underlying PAH features, the blending of neighbouring
features, and different diagnostics giving conflicting
estimates. They also focus only on small parts of the spectrum,
disregarding the information contained in the remainder.

Larger regions of the spectrum can be investigated with
the multivariate statistic, Principal Component Analysis
(PCA). PCA has been used for spectral classification of
optical galaxies (e.g. Connolly et al. 1995; Bromley et al.
1998). Wang et al. (2011) carried out PCA on the IRS spectra
of 119 local ULIRGs. They argued, qualitatively, that
only 4 principal components (PCs) were needed to reproduce
the variance in the ULIRG spectra. They also proposed that
the contribution from each PC had some underlying physical
interpretation. Examination of the first four PCs, and comparisons
to the diagnostics employed by Spoon et al. (2007)
and Nardini et al. (2009), suggested that PC1 constrains the
dust temperature and geometry of the distribution of source
and dust, while PC2 and PC3 determine the amount of star
formation. The fourth PC is important for Seyfert Type 2
galaxies, and is hence a possible indicator of an unobscured
AGN.

In this paper we extend Wang et al. (2011) by quantitatively
investigating how many PCs are needed to explain the
variation in the spectra and compare the PC reconstructions
to fits provided by a suite of radiative transfer models. We
investigate what information the radiative transfer models
are missing. We also re-examine what physical properties are
behind the PCs, by investigating the relationship between
the physical parameters of models and the contributions
from different PCs. Finally, we introduce a new classification
scheme using 5D Gaussian mixtures modelling and trained
with optical classifications. Section 2 gives an overview of
the data and Section 3 a brief description of PCA. Section 4
reviews the radiative transfer models being applied, and
Section 5 presents the results. Conclusions are presented
in Section 6. We assume a spatially flat cosmology
with $H_0 = 70$ km s$^{-1}$ Mpc$^{-1}$, $\Omega = 1$, and $\Omega_m = 0.3$.



2 THE DATA 

This paper uses the same sample of mid-infrared spectra
as Wang et al. (2011). We summarise their selection criteria
here. The sample comprises the ULIRGs observed as part of the
IRS Guaranteed Time program (Armus et al. 2007; Farrah et al. 2007;
Spoon et al. 2007) and those observed by Imanishi et al.
(2007). An upper redshift cut of z = 0.35 was applied to
ensure we sample approximately the same wavelength range
for each object. A further eight objects were removed as
they have poor-quality data in the longer-wavelength IRS
module. In total, there are 119 objects in the sample.



3 PRINCIPAL COMPONENT ANALYSIS (PCA)

PCA works by determining the eigenvectors from the co- 
variance matrix of a given dataset. For 119 spectra, each 
with 180 wavelength points, the 180 by 180 covariance ma- 
trix quantifies the correlation between each spectral point. 
The eigenvectors of the matrix can be thought of as spectral 
components that can be linearly combined to reconstruct 
each object in the sample. 

Any spectrum can be linearly decomposed by projecting 
it onto the principal components defined by the 119 ULIRG 
sample. This allows each spectrum to be described by the 
contribution from each PC. These contributions define co- 
ordinates in a multidimensional space which we refer to as 
PCA space. 
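
The procedure above can be sketched in a few lines of Python. This is only an illustrative sketch, not the code used for the paper: the function names (`derive_pcs`, `project`, `reconstruct`) are hypothetical, the random array stands in for the 119 normalised IRS spectra, and we assume the mean spectrum is subtracted before the covariance matrix is formed.

```python
import numpy as np

def derive_pcs(spectra):
    """Return the mean spectrum, eigenvalues and PCs (one per row),
    sorted by decreasing eigenvalue (variance)."""
    mean = spectra.mean(axis=0)
    cov = np.cov(spectra - mean, rowvar=False)   # (n_wave, n_wave) covariance
    evals, evecs = np.linalg.eigh(cov)           # eigh returns ascending order
    order = np.argsort(evals)[::-1]
    return mean, evals[order], evecs[:, order].T

def project(spectrum, mean, pcs, n_pc):
    """Contributions of the first n_pc PCs: a position in PCA space."""
    return pcs[:n_pc] @ (spectrum - mean)

def reconstruct(coeffs, mean, pcs):
    """Mean spectrum plus the weighted sum of the leading PCs."""
    return mean + coeffs @ pcs[:len(coeffs)]

# Placeholder data: 119 spectra with 180 wavelength points each.
rng = np.random.default_rng(0)
spectra = rng.normal(size=(119, 180))
mean, evals, pcs = derive_pcs(spectra)
position = project(spectra[0], mean, pcs, 5)     # 5D PCA-space coordinates
```

Because the eigenvectors are orthonormal, the projection is a plain dot product, and a reconstruction with all 180 components recovers the input spectrum exactly.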



4 RADIATIVE TRANSFER MODELS 

To compare with the fits provided by the principal components,
we have carried out a minimum chi squared search over
linear combinations of a grid of starburst models described
in Siebenmorgen & Krügel (2007) and a grid of AGN dusty
torus models of Efstathiou & Rowan-Robinson (1995). The
libraries contain 5948 and 2109 SEDs respectively, and we
have considered linear combinations of each AGN and starburst
SED pair, giving $5948 \times 2109 \approx 1.25 \times 10^7$
model pairs to search over.



4.1 Starburst Models 



We use the Siebenmorgen & Krügel (2007) starburst models.
The models presented by Siebenmorgen & Krügel (2007)
have been described as 'hot spot' starbursts. OB stars are
assumed to be surrounded by dense clouds (the hot spots)
and other stars, such as old bulge stars or massive stars, are
dispersed in the diffuse medium. It is the hot spots that contribute
to the mid infrared part of the spectrum. The outer
radius of these environments is determined by the condition
of equal heating of the dust by the OB stars in the centre
and the interstellar radiation field.

Both stellar groups are treated as continuously distributed
sources, and the number density of both types of
stars falls off as $r^{-1}$.

The parameters of these models include the starburst
radius, $R$; the ratio of the luminosity of OB stars with hot spots
to total luminosity, $f_{OB}$; the total luminosity of the starburst,
$L_{SB}$; the total extinction from the outer radius of the
galactic nucleus to its centre, $A_V$; and the dust density of the hot
spot environment, $\rho_{hs}$, corresponding to hydrogen number
densities ($n_{hs}$) and assuming a gas to dust ratio of 150.
The parameter ranges can be found in Table 1. In total, the
library contains 5948 SEDs.

Parameter                 Range
R (kpc)                   0.35, 1 and 3
f_OB                      0.4, 0.6 and 0.9
L_SB (L_sun)              10^10 to 10^14 in steps of 0.1 dex
A_V (mag)                 2.2, 4.5, 7, 9, 18, 35, 70 and 120
n_hs (cm^-3)              10^2, 10^3, 2.5 x 10^3, 5 x 10^3, 7.5 x 10^3, 10^4

Table 1. Parameter values and ranges for the starburst models.

Parameter                 Range
τ                         500, 750, 1000, 1250
Θ (degrees)               30, 45, 60
r_in/r_out                20, 60, 100
θ (degrees)               0 to 90 with either 40 or 75 divisions
                          (depending on r_in/r_out)

Table 2. Parameter values and ranges for the AGN models.



4.2 AGN torus models 

This paper uses the AGN tapered disc models of
Efstathiou & Rowan-Robinson (1995). The tapered
disc models, in combination with the starburst models
of Efstathiou et al. (2000), have been successful in fitting
the spectral energy distributions of ultraluminous infrared
galaxies (Farrah et al. 2003), hyperluminous infrared
galaxies (Farrah et al. 2002; Verma et al. 2002; Efstathiou
2006), submillimeter galaxies (Efstathiou & Siebenmorgen
2009), and active galaxies (Alexander et al. 1999;
Efstathiou & Siebenmorgen 2005; Farrah et al. 2012;
Ruiz et al. 2001). The torus is modelled as a disc, whose
thickness increases with distance from the central source
but tapers off in the outer regions of the torus. The dust
density is distributed smoothly within the disc and follows
an $r^{-1}$ relation, with $r$ being radius. The parameters for the
AGN torus model are: the ultraviolet equatorial optical depth
to the centre of the torus, $\tau$; the opening angle of the torus,
$\Theta$; the ratio of inner to outer radius of the torus, $r_{in}/r_{out}$;
and the viewing angle, $\theta$. In total, there are 2109 AGN
SEDs.



4.3 The fitting procedure 

We have considered all linear combinations of a starburst
and AGN model when fitting the observed spectra of the 119
ULIRG sample. We use the wavelength grid of the starburst
models, and the lower resolution AGN models are interpolated
onto the same grid. The smoothness of the AGN models
makes the interpolation justifiable. The radiative transfer
models lack molecular hydrogen emission, so we mask out
regions of the spectrum where molecular hydrogen features
occur (i.e. 9.46 - 9.86, 12.08 - 12.48 and 16.83 - 17.23 μm).
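
As an illustration of these two preparation steps, the sketch below regrids a smooth model onto a target wavelength grid and builds the molecular hydrogen mask. The three masked windows are those quoted above; the wavelength grids, the toy AGN spectrum and the use of linear interpolation are assumptions made only for this example.

```python
import numpy as np

# Masked windows in micron, as quoted in the text.
H2_WINDOWS = [(9.46, 9.86), (12.08, 12.48), (16.83, 17.23)]

def h2_mask(wave):
    """Boolean array that is True where a wavelength point is kept."""
    keep = np.ones(wave.shape, dtype=bool)
    for lo, hi in H2_WINDOWS:
        keep &= ~((wave >= lo) & (wave <= hi))
    return keep

def regrid(wave_target, wave_model, flux_model):
    """Linearly interpolate a smooth model onto the target grid."""
    return np.interp(wave_target, wave_model, flux_model)

# Hypothetical grids: a finer starburst grid and a coarser AGN grid.
wave_sb = np.linspace(5.0, 25.0, 201)
wave_agn = np.linspace(5.0, 25.0, 41)
flux_agn = wave_agn ** 1.5                 # toy smooth AGN continuum

flux_agn_on_sb = regrid(wave_sb, wave_agn, flux_agn)
keep = h2_mask(wave_sb)                    # use flux[keep] when computing chi squared
```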

The wavelength resolution of the PCs is higher than
the starburst model resolution. For proper comparison to
the fits, and to allow decomposition of the models into PCA
space, we have re-derived the principal components for the
ULIRG sample at the resolution of the starburst models.
There is no significant change in the shape of the components.
We also note that the sign of the PC contributions for each
object remains the same, and the change in magnitude of
the PC contributions is not significant in comparison to the
spread of contributions for the sample.

To remain consistent with the analysis of Wang et al.
(2011), the models are normalised so that the mean flux
over the whole wavelength range is unity.

Figure 1. The variation of the median $\chi^2_\nu$ for the PC reconstruction
as the number of components used in the reconstruction is
increased. The dashed line indicates the median $\chi^2_\nu$ for the radiative
transfer model fits. For PC reconstructions using up to 10
PCs (and 20, 30, 40) we also plot the $\chi^2_\nu$ for every object (offset
for clarity).

We then carry out a linear least squares fit for each combination
of starburst and AGN model, with the condition
that the fit parameters are positive (i.e. to eliminate the possibility
of a negative amount of starburst or AGN). Model
comparison is then carried out via minimum chi squared
($\chi^2$).

We assumed a minimum of 5% flux error for each spectral
bin of the IRS spectra, which is consistent with the
observed variations between individual nod positions on the
IRS as described in Chapter 7 of the IRS Instrument Handbook.
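
The fitting procedure of this section can be sketched as follows. This is a minimal illustration rather than the code used for the paper: `fit_pair` and `best_model` are hypothetical names, the toy libraries stand in for the real model grids, and the non-negativity constraint is handled by enumerating the possible active sets of the two-parameter problem, which is one of several valid ways to solve it.

```python
import numpy as np

def fit_pair(flux, err, sb, agn):
    """Non-negative (a, b) minimising chi2 of flux ~ a*sb + b*agn."""
    w = 1.0 / err
    A = np.column_stack([sb * w, agn * w])
    y = flux * w
    candidates = [np.zeros(2)]                    # both amplitudes zero
    full, *_ = np.linalg.lstsq(A, y, rcond=None)  # unconstrained solution
    if np.all(full >= 0):
        candidates.append(full)
    for k in range(2):                            # one component only
        coef = np.zeros(2)
        coef[k] = max(0.0, (A[:, k] @ y) / (A[:, k] @ A[:, k]))
        candidates.append(coef)
    chi2 = [np.sum((y - A @ c) ** 2) for c in candidates]
    best = int(np.argmin(chi2))
    return candidates[best], chi2[best]

def best_model(flux, err, sb_library, agn_library):
    """Minimum chi2 search over every starburst/AGN pair."""
    best = (np.inf, None, None, None)
    for i, sb in enumerate(sb_library):
        for j, agn in enumerate(agn_library):
            coef, chi2 = fit_pair(flux, err, sb, agn)
            if chi2 < best[0]:
                best = (chi2, i, j, coef)
    return best

# Toy libraries and a synthetic "observed" spectrum.
wave = np.linspace(5.0, 25.0, 50)
sb_library = [np.exp(-(wave - 8.0) ** 2), wave / 15.0]
agn_library = [np.ones_like(wave), (wave / 15.0) ** 2]
flux = 0.7 * sb_library[1] + 0.3 * agn_library[0]
err = np.maximum(0.0, 0.05 * np.abs(flux))       # 5% floor; measured errors set to 0 here
chi2_min, i_sb, i_agn, amplitudes = best_model(flux, err, sb_library, agn_library)
```

With real data the measured uncertainty would replace the `0.0` in the error floor, so that each bin carries the larger of its measured error and 5% of its flux.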



5 RESULTS 

5.1 Optimum number of components 

We first investigate how many PCs are needed to describe
the ULIRG sample. Wang et al. (2011) did not quantitatively
show whether 4 PCs were sufficient. Using the PCs
re-derived at the lower resolution described in Section 4.3,
we have investigated how many PCs are needed to accurately
reconstruct the IRS spectra of all 119 ULIRGs in
the sample. For each spectrum, we quantify the goodness
of reconstruction with the reduced chi squared statistic, $\chi^2_\nu$,
where the number of degrees of freedom is equal to the number
of wavelength points minus the number of PCs used in
the reconstruction.

Figure 3. The mean spectrum and principal components for the sample of ULIRGs. The dot-dashed vertical lines mark the central
location of the 6.2, 7.7, 8.6, 11.2 and 12.7 μm PAH emission lines. The dotted lines indicate the location of the molecular hydrogen lines
at 9.66, 12.28 and 17.03 μm. The solid vertical lines indicate the position of the neon fine-structure lines, [Ne II] 12.8, [Ne V] 14.3 and
[Ne III] 15.6 μm.

Figure 2. The eigenvalues (solid line) and difference in eigenvalues
(dotted line) for the PCs. The eigenvalues quantify the
variance associated with each PC, and are a measure of importance.
The difference between eigenvalues drops dramatically for
the first few PCs, but levels off beyond 5 (indicated by the dashed
line). We therefore argue that 5 PCs is a more suitable number
than the 4 used in Wang et al. (2011).
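
A minimal sketch of this statistic is given below, assuming orthonormal PCs stored one per row and an unweighted projection onto them; the function name and the synthetic spectrum are hypothetical and serve only to show the calculation.

```python
import numpy as np

def reduced_chi2(flux, err, mean, pcs, n_pc):
    """Reduced chi squared of a reconstruction using the first n_pc PCs."""
    coeffs = pcs[:n_pc] @ (flux - mean)           # PC contributions
    model = mean + coeffs @ pcs[:n_pc]            # reconstructed spectrum
    dof = flux.size - n_pc                        # wavelength points minus PCs
    return np.sum(((flux - model) / err) ** 2) / dof

# Synthetic example: a spectrum built from exactly three orthonormal PCs.
rng = np.random.default_rng(1)
q, _ = np.linalg.qr(rng.normal(size=(180, 10)))
pcs = q.T                                         # 10 orthonormal components
mean = np.ones(180)
flux = mean + 2.0 * pcs[0] - 1.0 * pcs[1] + 0.5 * pcs[2]
err = np.full(180, 0.05)
values = [reduced_chi2(flux, err, mean, pcs, n) for n in range(1, 6)]
```

In this toy case the statistic drops to zero once the third component is included, since nothing remains to be explained; for real spectra the noise keeps it near unity.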

Figure 1 shows that as we increase the number of PCs
used in the reconstruction, the median $\chi^2_\nu$ value for the sample
decreases. We have plotted the $\chi^2_\nu$ for each individual object
for reconstructions using up to ten PCs and the median
$\chi^2_\nu$ value obtained by fitting the ULIRGs with the radiative
transfer models as described in Section 4.3. Ten PCs would
appear to be the optimal number, i.e. where $\chi^2_\nu = 1$. We find
that four PCs (assumed by Wang et al. 2011) give a median
$\chi^2_\nu$ of 3.3, while adding a fifth component substantially
decreases the median $\chi^2_\nu$ to 2.1. The use of six and seven
PCs only reduces the median $\chi^2_\nu$ to 1.8 and 1.6 respectively.

The eigenvalues associated with each PC are a measure
of the variance each PC accounts for and provide an alternative
method to determine the optimum number of components.
In Figure 2 we plot the eigenvalues and difference in
eigenvalues for the PCs. The general trend is that the difference
between eigenvalues decreases significantly with each
component. The exceptions occur between the
3rd-4th components and the 5th-6th components, where the
difference between eigenvalues is larger than the trend. We
interpret this larger than expected difference as an indication
that the previous component captures significantly more information
than the next. This suggests that the third and
fifth PCs are substantially more important than the fourth
and sixth respectively. Beyond the sixth PC, the trend flattens
out, indicating most of the variation related to structure
has been captured. Overall, Figures 1 and 2 do not definitively
indicate the optimum number of PCs. However, we
argue that the reduction in $\chi^2_\nu$ to 2.1 and the difference in eigenvalue
between the fifth and sixth PC indicate that five PCs,
rather than the four PCs used by Wang et al. (2011), strike
a better balance between providing a small basis set of templates
and adequately describing the spectra.

Figure 4. An example of our fit, for the object 14252-1550. The radiative
transfer model is plotted with a dashed line, and the principal
component reconstruction with 5 PCs is shown with a dotted
line. The residual over error is also shown to indicate where either
technique may be failing.
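
The eigenvalue-difference diagnostic is simple to compute. In the sketch below the function name is hypothetical and the eigenvalues are made-up numbers chosen only to show an unusually large gap after the third value; they are not the eigenvalues of the ULIRG sample.

```python
import numpy as np

def eigenvalue_gaps(evals):
    """Differences between successive eigenvalues, largest eigenvalue first."""
    evals = np.sort(np.asarray(evals, dtype=float))[::-1]
    return evals[:-1] - evals[1:]

# Hypothetical eigenvalues, for illustration only.
example_evals = [10.0, 4.0, 3.5, 1.0, 0.9]
gaps = eigenvalue_gaps(example_evals)   # gaps[k] compares component k+1 with k+2
```

A gap that stands out above the otherwise declining sequence marks a component that carries markedly more variance than its successor.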

The fifth component was not discussed in Wang et al.
(2011) and so we now show this component, compared to the
original four. The mean spectrum of the 119 ULIRGs and the
5 components can be seen in Figure 3. There are a number
of spectral features in this fifth component, most notably
the 6.2, 11.2 and 12.7 μm PAH emission lines as well as the
molecular hydrogen emission line at 17.03 μm. The 6.2 μm
emission feature has negative flux, while the 11.2 and 12.7 μm
PAH lines are both positive. Overall, the fifth component
does not contain any new features that were not seen in the
previous components. Its role appears to be in altering the
ratios of existing features.



5.2 Analysis of the radiative transfer models 

We now investigate whether the radiative transfer models
discussed in Section 4 are capable of modelling the spectra.
An example of the fit produced by 5 PCs and the radiative
transfer models can be seen in Figure 4.

We now compare all the $\chi^2_\nu$ for reconstructions using 5
PCs with the $\chi^2_\nu$ for our radiative transfer model fits. Figure
5 shows the distribution of the reduced chi squared values
for both the 5 PC reconstructions and the radiative transfer
model fits, for all ULIRGs in the sample. A 5 component
reconstruction fits the spectra better, on average, than the
radiative transfer models.




Figure 5. The $\chi^2_\nu$ values for each object in the sample for both
radiative transfer model fits and the 5 PC reconstruction. Most
objects do better with the PCs.



We have shown that 5 PCs can explain the sample of
ULIRGs better than the radiative transfer models, but the
two are not competing methodologies. The PCs will always
do better than the models as they are derived from the data
and the number of PCs is increased until the reproduction
of the spectra is good. They represent an extraction of most
of the important information from the spectra. Radiative
transfer models are used to give us physical information about
objects. However, Figure 5 indicates that the ULIRGs are
not modelled well on average by the radiative transfer models.

By comparing the models to the PCs, we can investigate
what information is in the PCs that is not in the models.
Figure 6 shows the contributions made by the five PCs, as
a function of $\chi^2_\nu$ for the model fits. We only plot objects
that have a reasonable $\chi^2_\nu$ for the 5 PC reconstruction, i.e.
a $\chi^2_\nu \leq 3$. We also bin the model $\chi^2_\nu$ values into three bins.
The mean and one sigma dispersion are overplotted as filled
circles and errorbars.

The general increase of $\chi^2_{\nu,\mathrm{Models}}$ in Figure 6 shows that
models tend to do worse when the objects have a large, positive
contribution from PC1. Wang et al. (2011) suggested a
large, positive contribution from PC1 indicated colder dust.
Our results suggest that objects with colder dust are not
well modelled by the AGN and starburst component models.
The increase in dispersion with $\chi^2_{\nu,\mathrm{Models}}$ for PC2 and
PC3 indicates models do worse when there is a large, absolute
contribution from PC2 and PC3. PC2 and PC3 relate
to strong spectral lines, which would indicate that the models
have problems with constraining the strength of spectral
lines. Lower values of $\chi^2_{\nu,\mathrm{Models}}$ appear to occur when objects
have negative values of PC4, but as the $\chi^2_{\nu,\mathrm{Models}}$ values increase
beyond 2, there appears to be little change in PC4. A
negative contribution in PC4 would suppress emission features,
indicating that models are again inadequate in modelling
spectral features. There appears to be little change of
PC5 contribution with $\chi^2_{\nu,\mathrm{Models}}$.

In Figure 7 we show the stacked difference between
spectra and radiative transfer model fits (solid line) and the
spectra and 5 PC reconstructions (dotted line). The stacked
difference for spectra and models illustrates that model fits
underestimate the PAH spectral lines and do not include
neon fine structure lines or molecular hydrogen lines. The
PAH underestimate is consistent with our interpretation of
Figure 6. It suggests that the Krügel (2003) PAH treatment
used by the Siebenmorgen & Krügel (2007) starburst models
is unsuitable for the extreme starforming ULIRGs. As
expected, the PC reconstructions perform considerably better
than the models.

Figure 6. The contributions made by each PC against the $\chi^2_\nu$
from the model fits. Only objects with a 5 PC fit of $\chi^2_\nu \leq 3$ have
been plotted. The mean and one sigma dispersion for three bins
are overplotted as filled circles and errorbars.
Figure 7. The stacked difference between ULIRG spectra and
best fit radiative transfer models (solid line) and the ULIRG spectra
and 5 PC reconstructions (dotted line). The dot-dashed vertical
lines mark the central location of the 6.2, 7.7, 8.6, 11.2 and 12.7
μm PAH emission lines. The dotted lines indicate the location of
the molecular hydrogen lines at 9.66, 12.28 and 17.03 μm. The
solid vertical lines indicate the position of the neon fine-structure
lines, [Ne II] 12.8, [Ne V] 14.3 and [Ne III] 15.6 μm.
Figure 8. The position of ULIRGs in four of the PC planes
(squares) and the position in PCA space of the corresponding
best fit radiative transfer models (filled circles). Each ULIRG and
best fit model are joined by a solid line. The arrows in the top
left of each plot show the mean difference between ULIRGs and
models.



5.3 Interpreting the Principal Components 

We have shown that 5 PCs provide a simple empirical
basis set that captures most of the important variations
in ULIRGs. We have also shown some limitations of the
models. Nevertheless, the models still describe some of the
physics of the objects and can be cautiously used to investigate
whether the components are associated with physical
parameters. We investigate the components by directly
comparing the PC contributions and the radiative transfer
model best fits for the ULIRG sample. Figure 9 shows the
contribution from each PC as a function of the viewing angle
and starburst/AGN contribution. We have binned the PC
contributions and calculated the average and one sigma dispersion
for each bin. These are over-plotted with errorbars.
PC1 shows a correlation with viewing angle of AGN, with
positive contributions corresponding to an obscured AGN
and negative to face on AGN. The contribution from the
fourth PC appears to drop with viewing angle from around
j radians. The other PCs show no discernible dependence.
The starburst/AGN contribution is plotted against PCs in
the right hand side of Figure 9. Negative values of PC3 seem
to be associated with AGN dominated sources and positive
values with starbursts. The other PCs show a large amount
of dispersion and little correlation with starburst/AGN contribution.

Figure 9. The contribution from each PC against the radiative
transfer parameters of viewing angle and starburst/AGN contribution.
The average contribution for three bins and associated
one sigma dispersion are overplotted.

Figure 10. The contribution from each PC against viewing angle
tracks and power ratio for 50 of the ULIRGs. For each best fit
radiative transfer model, the viewing angle and power ratio have
been varied to create tracks in PCA space.

8 P.D. Hurley et al.

We now decompose the radiative transfer model fits into
the PCA space described in Section 3. The position of each
ULIRG (squares) in four of the PCA planes and the corresponding
best fit model (filled circles) can be seen in Figure 8. The
mean difference between the ULIRGs and models is depicted
by the arrow in the top right of each plane.

We find the location of radiative transfer model best 
fits in PCA space are offset relative to the ULIRG positions. 
There are numerous explanations for the offset. The sparse- 
ness of the model library could be a factor. The decompo- 
sition into PCA space may also be affected by the missing 
physics in the models. We therefore treat the model tracks 
with caution and limit interpretation to relative changes in 
PC contribution rather than absolute position. 

We have taken the best fit radiative transfer model and
varied each parameter in turn to see how it affects the position
in PCA space. We focus on the viewing angle of the AGN and the
ratio of starburst to AGN power, $p$, which we define as:

$L_{\mathrm{total}} = p\,L_{\mathrm{SB}} + (1 - p)\,L_{\mathrm{AGN}}$    (1)

A value $p = 0$ describes a pure AGN model, and $p = 1$
relates to a complete starburst.
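
A power-ratio track of this kind can be sketched as follows. The function name, the toy spectra and the random orthonormal components are all hypothetical; we also assume, for the purposes of the example, that the starburst and AGN spectra have been scaled to the same total luminosity, so that the mixing weight $p$ in Equation (1) is the luminosity fraction, and that each mixed spectrum is renormalised to unit mean flux before projection.

```python
import numpy as np

def power_ratio_track(sb, agn, mean, pcs, n_steps=11):
    """PCA-space positions as p runs from 0 (pure AGN) to 1 (pure starburst)."""
    track = []
    for p in np.linspace(0.0, 1.0, n_steps):
        spec = p * sb + (1.0 - p) * agn       # Equation (1), per wavelength bin
        spec = spec / spec.mean()             # normalise to unit mean flux
        track.append(pcs @ (spec - mean))     # project onto the PCs
    return np.array(track)

# Toy inputs standing in for a best fit model pair and the derived PCs.
wave = np.linspace(5.0, 25.0, 180)
sb = 1.0 + np.exp(-(wave - 7.7) ** 2)         # toy spectrum with a PAH-like bump
agn = wave / 15.0                             # toy smooth continuum
mean = np.ones(180)
rng = np.random.default_rng(2)
q, _ = np.linalg.qr(rng.normal(size=(180, 5)))
pcs = q.T                                     # 5 orthonormal components

track = power_ratio_track(sb, agn, mean, pcs) # shape (11, 5)
```

Each row of `track` is one point along the track; plotting a column against $p$ shows how that PC contribution responds to the power ratio.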

Figure 10 shows the 1D parameter tracks for 50 randomly
selected ULIRGs. The viewing angle tracks show a
decrease in PC1 contribution when going from an obscured
to face on AGN. Tracks in PC4 are curved, indicating a
non-linear relationship with viewing angle. PC3 appears to
be a good indicator for the power ratio, with PC3 contribution
decreasing as AGN power begins to dominate. Tracks
in PC5 also show a slight correlation with power ratio, while
for the other PCs the relationship is unclear.

The interpretation of tracks is consistent with the conclusions
drawn from Figure 9. Certain PCs appear to be
related to the physics of the ULIRGs. We have shown that
PC1 is linked to AGN viewing angle, while PC3 is linked to
the star formation and AGN contribution. This is consistent
with the interpretation of Wang et al. (2011).

5.4 Gaussian mixtures classification scheme 

Since we have shown the PCs capture most of the information
in IRS spectra, it is natural to use the PCs as a
classification tool. Wang et al. (2011) suggested that position
in the PC1-PC4 plane was related to optical type. We
now take this one step further by proposing a classification
scheme based on optical classifications, using the
multi-dimensional Gaussian mixtures modelling applied in
Davoodi et al. (2006). This type of parametric modelling
works by assuming the density function of galaxies in our
5D PCA space is composed of a mixture of multidimensional
Gaussian functions. We take the four optical classifications
(Seyfert 1, Seyfert 2, LINER and HII) that exist for 78 of
our 119 ULIRG sample, and assume the density of objects in
each classification can be described as Gaussian. The resulting
position and width of each Gaussian are trained from the
optical classifications. They can be thought of as a probability
density function (PDF) that describes the probability
of belonging to each optical classification, as a function of
position in PCA space.
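
The idea can be illustrated with the short sketch below. It is not the released classification code: the function names are hypothetical, the training points are synthetic, and we assume one full-covariance Gaussian per class, weighted by the fraction of training objects in that class, with an object assigned to the class of highest weighted density.

```python
import numpy as np

def train(points, labels):
    """Fit one Gaussian (mean, covariance, weight) per optical class."""
    labels = np.asarray(labels)
    model = {}
    for cls in np.unique(labels):
        x = points[labels == cls]
        model[str(cls)] = (x.mean(axis=0), np.cov(x, rowvar=False),
                           len(x) / len(points))
    return model

def log_density(x, mu, cov):
    """Log of the multivariate Gaussian density at position x."""
    d = x - mu
    _, logdet = np.linalg.slogdet(cov)
    maha = d @ np.linalg.solve(cov, d)        # squared Mahalanobis distance
    return -0.5 * (maha + logdet + len(mu) * np.log(2.0 * np.pi))

def classify(x, model):
    """Class whose weighted Gaussian density is highest at x."""
    scores = {c: np.log(w) + log_density(x, mu, cov)
              for c, (mu, cov, w) in model.items()}
    return max(scores, key=scores.get)

# Synthetic 5D training set with two well separated classes.
rng = np.random.default_rng(3)
points = np.vstack([rng.normal(3.0, 1.0, size=(20, 5)),
                    rng.normal(-3.0, 1.0, size=(20, 5))])
labels = ["Seyfert 1"] * 20 + ["HII"] * 20
model = train(points, labels)
predicted = classify(np.full(5, 3.0), model)
```

Exponentiating and normalising the scores would give a membership probability for each class rather than a single label.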

Figure 11 shows the marginalised one sigma contours
for the optical classifications in four 2D projections. We note
that the one sigma contours are for visualisation only; our
classification scheme makes use of all 5 dimensions. The objects
with optical classifications are represented with different
symbols: crosses for Seyfert 1, triangles for Seyfert 2,
squares for LINER and open circles for objects classified
as HII. Objects without an optical classification are plotted
with a diamond. The success rate of our classification can
be found in Table 3.

Table 3. The percentage of objects in the four classifications as
a function of their original classification. Not classified refers to
those objects without an optical classification.

The classification scheme is very successful in correctly
identifying Seyfert 1 like objects, while most of the Seyfert
2s are classified either correctly or as LINERs. The majority of
LINER objects are correctly identified, while the majority
of HII optically classified ULIRGs are spread across the HII and
LINER groups. Both the LINER and the HII classifications
lie in similar areas of PCA space, and discrete classification
for objects in this region may not be completely appropriate
as many ULIRGs will show signs of both. Overall our
5D Gaussian classification scheme works well in associating
regions in PCA space with type of object and is a powerful
tool for objectively classifying objects.

We have used our classification scheme to classify the
41 ULIRGs with no optical classification. The percentages
can be seen in Table 3. We find the majority are HII and
LINER objects while 12% are classified as Seyfert 2 like objects.
None of the objects appear to be Seyfert 1, suggesting
optical classification of Seyfert 1 objects is complete. We
now make use of our 5D Gaussian classification scheme by
creating average spectra for our four classifications using all
119 ULIRGs. Before averaging the spectra, each spectrum is
normalised so that the mean flux over the whole wavelength
range is unity. The resulting four average templates can be
seen in Figure 12. As expected, the HII and LINER templates
are similar, whilst the Seyfert templates have very little
PAH emission.
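
The template construction amounts to a normalise-then-average step, sketched below with a hypothetical function name and random placeholder spectra; the class labels are assumed to come from the classification scheme above.

```python
import numpy as np

def class_templates(spectra, labels):
    """Average spectrum and one sigma dispersion for each class,
    after normalising every spectrum to unit mean flux."""
    spectra = spectra / spectra.mean(axis=1, keepdims=True)
    labels = np.asarray(labels)
    return {str(c): (spectra[labels == c].mean(axis=0),
                     spectra[labels == c].std(axis=0))
            for c in np.unique(labels)}

# Placeholder spectra and labels, for illustration only.
rng = np.random.default_rng(4)
spectra = rng.uniform(0.5, 2.0, size=(6, 50))
labels = ["HII", "HII", "LINER", "LINER", "Seyfert 2", "Seyfert 1"]
templates = class_templates(spectra, labels)
hii_mean, hii_sigma = templates["HII"]
```

Since every input spectrum has unit mean flux after normalisation, each average template also has unit mean flux, which makes the templates directly comparable in shape.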



6 CONCLUSIONS 

We have shown that five principal components are needed
to describe most of the variation in the 119 local ULIRG
sample and are more successful than a full $\chi^2$ fitting by radiative
transfer models. We have examined what the radiative
transfer models are missing. The fits provided by radiative
transfer models appear to need a cold dust component
and have difficulty in modelling the strength of strong PAH
emission lines.

We have used a combination of the
Siebenmorgen & Krügel (2007) starburst models and
Efstathiou & Rowan-Robinson (1995) AGN torus templates
to investigate what physical parameters are behind
the components. We have examined how best fit model
parameters are related to PC contribution. Overall, our
conclusions are consistent with those reached in Wang et al.
(2011). Contributions from PC1 appear to indicate the
viewing angle of AGN, with negative contributions associated
with face on AGN and positive for obscured AGN.
PC3 appears to be the best indicator of whether it is the
AGN or starburst that is the prevailing power source.

The PCs consider a large part of the mid-infrared spectrum
and are therefore less likely to be affected by problems
associated with diagnostics based on single spectral
features such as the PAH emission lines, where measuring
line strength can be difficult. We suggest the five PCs would
be useful as empirical templates for ULIRG spectra in the
IRS public database (Lebouteiller et al. 2011).

Figure 11. Four out of the possible 10 2D projections for our PCA space with the one sigma contours for the Gaussian mixtures based
classifications. Optically classified Seyfert 1 objects are marked by crosses, Seyfert 2 by triangles, LINERs by squares and HII classified
objects with open circles. Those objects without optical classification are marked by diamonds.

Figure 12. The average spectra for the four classifications: HII (29 objects), LINER (54 objects), Seyfert 2 (20 objects) and Seyfert 1
(16 objects). The dotted lines represent the one sigma dispersion in each classification.

We also introduce a new Gaussian mixtures classification
scheme based on location in the five dimensional PCA
space and trained via optical classifications. Objects can be
classified as either Seyfert 1, Seyfert 2, LINER or HII-like.
We note that any ULIRG with IRS spectra (in the relevant
wavelength range) can be decomposed onto the PCs, and the
position in PCA space can be used to classify the object.

We have used our classification scheme to provide a
set of average spectra for the four groups. We make these,
the five PCs and code to classify objects available at:
http://www.phys.susx.ac.uk/~pdh21/PCA/